Access time of an adaptive random walk on the world-wide Web 
We introduce and simulate a random walk that adapts its move strategies according to local
node preferences on a directed graph. We consider graphs with double-hierarchical connectivity and
variable wiring diagram in the universality class of the world-wide Web. The ensemble of walkers
reveals the structure of local subgraphs with dominant promoters and attractors of links. The
average access time decays with the distance in hierarchy $\Delta q$ as a power law, $\langle t_{aw} \rangle \sim (\Delta q)^{-\theta}$. The
access to highly connected nodes is orders of magnitude shorter compared to the standard random
walk, suggesting the adaptive walk as an efficient message-passing algorithm on this class of graphs.
Networks pervade all science [1,2], making the problem of
understanding the structure and dynamics of complex networks one of
the greatest challenges today. Cellular and metabolic networks,
chemical reactions [3], social collaboration [4] and science citation
networks [5], and the world-wide Web [6] are examples of networks that
can be characterized by random graphs with individual dynamics and
coupling architecture. In his inspiring paper Strogatz [1] suggests
that the inherent difficulty in understanding networks lies in the
intimate relationship between their structural complexity and
evolution. To this end, the principle of universality that applies to
systems with scale-free structures can play an important role in
revealing those properties of the dynamics that are relevant for the
universal behavior of a network. The world-wide Web belongs to the
class of directed graphs with double-hierarchical organization of node
ranks [6] in which the wiring diagram rapidly changes in time. By
exploiting the universality, the recently proposed model [7] suggests
minimum dynamic rules that are able to account for the complexity of
the Web. So far other systems in this universality class are not
known. It is tempting to believe that certain catalytic reactions in
an open environment [8] can be represented by dynamic processes on
directed graphs with variable wiring diagram.

Studying the response and relaxation of a network is the best way to
understand how its structure affects its function. Various dynamic
processes, e.g., percolation and fragmentation by diluting links
[9](a), core percolation by stripping leaves [9](b), and spreading of
epidemics [10] on networks of a given structure have been examined.

In this work we study random-walk dynamics on a directed graph. In
particular, we consider a double-hierarchical directed graph in the
universality class of the world-wide Web [7], in which in- and
out-links are governed by two distinct (and statistically dependent)
power-law probability distributions. We introduce a random walker
that copies its move strategies from the local node linking
preferences. An ensemble of adaptive random walkers efficiently scans
the connected subgraphs. The prominent feature of this class of
hierarchically connected graphs is that the average access time to a
node at distance $\Delta q$, measured by in- or out-ranks, decreases
as a fractional power of the distance, indicating the presence of a
peculiar structure with few dominant promoter and attractor nodes. The
adaptive random walk suggests how an efficient message-passing
algorithm can be constructed that is driven by the properties inherent
to this class of graphs. We demonstrate the advantages of the adaptive
random walk by simulating, in parallel, the standard random walk on
the same graph.

We consider a directed graph evolving from the dynamic rules that
were recently suggested [7] to mimic the growth of the world-wide Web
(the Web graph). The basic properties of the model are [7]: (i)
Directed linking, meaning that at a node out- and in-links are not
symmetric; (j) Growth and rearrangements (updates of links) at a
unique time scale; and (k) Bias update and bias attachment of links,
with probabilities specified below. At each growth step a new node is
added to the network and the number of links changes by an amount
$M(t)$. A fraction $f_0(t) = \alpha M(t)$ of the new links are
outgoing links from the newly added node $i = t$, whereas the
remaining $f_1(t) = (1 - \alpha) M(t)$ links are updated links at
other nodes in the network. Hence, the relevant parameter in the
model is the ratio of updated and added links at each time step,
i.e., $\beta = f_1(t)/f_0(t) = (1 - \alpha)/\alpha$, which is
independent of the actual increment $M(t)$. Furthermore, the
variations in $M(t)$ are such that the average value
$M = \langle M(t) \rangle$ is finite and can be considered constant in
first approximation. In practice, the number of nodes and the number
of links in the network increase with time, so that reasonable values
for the average $M$ are positive. For consistency, we keep $M = 1$
throughout this work (rendering reasonable computation time) [11].

Bias activity of agents who create or update outgoing links from the
Web pages (nodes) and bias (preferential) attraction of links can be
formulated via the following rules: At the growth step $i$ an outgoing
link is created from a node $n < i$ with probability

$$ p_1(n,i) = \frac{\alpha M + q_{out}(n,i)}{(1 + \alpha) M i} . \qquad (1) $$

The link points towards the node $k$ with the probability

$$ p_2(k,i) = \frac{\alpha M + q_{in}(k,i)}{(1 + \alpha) M i} , \qquad (2) $$

where $q_{out}(n,i)$ and $q_{in}(k,i)$ are the current numbers of
outgoing and incoming links at the respective nodes at the growth
step $i$. It is assumed that at the time of addition of a node $i$ to
the network $q_{out}(i,i) = q_{in}(i,i) = 0$. Therefore, the biasing
in the dynamics is linked to the time fluctuations of the node ranks.
The effect of the attachment rule in Eq. (2) on the distribution of
in-degree was studied analytically in Ref. [12]. For values of the
control parameter $\beta$ in the range $0 < \beta < \infty$,
corresponding to $1 > \alpha > 0$, the network has the capability to
rearrange its structure of links at the pace at which it grows. This
property makes the Web substantially different from networks that
have static links. In our notation the networks with fixed links,
e.g., the science citation network [5], correspond to the limit
$\beta = 0$ (i.e., $\alpha = 1$).
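The growth rules in Eqs. (1)-(2) can be illustrated with a minimal sketch for $M = 1$, where one link is created per step. The function name `grow_web_graph`, the start from a single link-free node, the exclusion of self-links, and the simple quadratic-time sampling are assumptions made for this illustration and are not taken from the model definition.

```python
import random

def grow_web_graph(n_nodes, alpha, seed=0):
    """Grow a directed graph with one new link per step (M = 1).

    Returns (out_links, q_in): out_links[n] lists the targets of node n,
    q_in[k] counts the incoming links of node k. Requires 0 < alpha <= 1.
    """
    rng = random.Random(seed)
    out_links = [[] for _ in range(n_nodes)]
    q_in = [0] * n_nodes
    for i in range(1, n_nodes):              # node i is added at step i
        if rng.random() < alpha:
            source = i                       # new link leaves the new node
        else:
            # update at an older node n < i, weight alpha*M + q_out(n), Eq. (1)
            w_out = [alpha + len(out_links[n]) for n in range(i)]
            source = rng.choices(range(i), weights=w_out)[0]
        # target k with weight alpha*M + q_in(k), Eq. (2);
        # self-links are excluded (assumption of this sketch)
        candidates = [k for k in range(i + 1) if k != source]
        w_in = [alpha + q_in[k] for k in candidates]
        target = rng.choices(candidates, weights=w_in)[0]
        out_links[source].append(target)
        q_in[target] += 1
    return out_links, q_in

out_links, q_in = grow_web_graph(1000, alpha=0.25)
print("links:", sum(q_in), "max in-degree:", max(q_in))
```

The per-step rebuilding of the weight lists keeps the sketch short; a production simulation of large networks would need a faster sampling scheme.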

As discussed in detail in Ref. [7], the network that evolves
according to the dynamic rules in Eqs. (1)-(2) shows a complex
topology of links in which nodes are arranged hierarchically both
according to ranks of outgoing and incoming links, in accordance with
the data in the real Web [6]. The cumulative probability
distributions that describe node ranks follow power laws [7],
characterized by the scaling exponents $\tau_{in}$ and $\tau_{out}$
for incoming and outgoing links, respectively (Eq. (3)).
The exponent for incoming links is given by the exact result
$\tau_{in} = 2 + \alpha$, whereas $\tau_{out} \approx 2 + 3\alpha$ [7]
is approximately linear for $\alpha < 0.66$ and increases faster than
linearly for $\alpha \gtrsim \alpha_c < 1$. For $\alpha \to 1$ the
distribution of outgoing links loses the scaling behavior and
approaches a random distribution. By comparison with the measured
distributions of outgoing and incoming links in the real world-wide
Web [6], the parameter $\alpha$ is estimated as [7]
$\alpha = 0.22 \pm 0.1$, leading to $\beta$ in the range 3-4.
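As a quick numerical check of the relations quoted above, the following sketch evaluates $\beta = (1 - \alpha)/\alpha$, $\tau_{in} = 2 + \alpha$ and the linear approximation $\tau_{out} \approx 2 + 3\alpha$ for a few values of $\alpha$. The helper name `model_exponents` is hypothetical, and the linear form of $\tau_{out}$ is only meaningful for $\alpha < 0.66$.

```python
def model_exponents(alpha):
    """Return (beta, tau_in, tau_out) for a given alpha in (0, 1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    beta = (1.0 - alpha) / alpha     # ratio of updated to added links
    tau_in = 2.0 + alpha             # exact result quoted in the text
    tau_out = 2.0 + 3.0 * alpha      # linear approximation, alpha < 0.66 only
    return beta, tau_in, tau_out

for alpha in (0.20, 0.22, 0.25):
    beta, tau_in, tau_out = model_exponents(alpha)
    print(f"alpha={alpha:.2f} beta={beta:.2f} "
          f"tau_in={tau_in:.2f} tau_out~{tau_out:.2f}")
```

Values of $\alpha$ between 0.20 and 0.25 give $\beta$ between 4 and 3, which is the range quoted in the text.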

The topology of links in Eq. (3) affects the character of the dynamic
processes on the Web graph and its relaxation properties. Next we
study two types of random walks on the Web graph: a naive random walk
and a walk with adaptive rules defined below. We first grow a network
of $N$ nodes using the evolution rules in Eqs. (1)-(2). The walk then
starts at time $t = 0$ from a randomly selected initial node, say node
$n$. At this node we have $q_{out}(n) \equiv q_{out}(n,N)$ outgoing
links. (It is assumed that the network does not grow during the walk,
although this restriction is not essential for the results.) A naive
random walker selects its next move along one of the outgoing links of
the node $n$ with equal probability $w_\ell(n) = 1/q_{out}(n)$, say
the link pointing to the node $k$, and moves there. At the next step
it makes a similar selection among the $q_{out}(k)$ links, and so on.
In contrast to these standard random-walk rules, the adaptive random
walker at each node selects the link with a certain statistical
weight. Here we assume that the weights are correlated with the
linking preferences in Eq. (2) of the visited node. In particular, the
walker investigates the target nodes $k_\ell$ at the other end of each
outgoing link $\ell$, $1 \le \ell \le q_{out}(n)$, of the visited node
$n$ and assigns the weights $w_\ell$ to the corresponding links, where

$$ w_\ell \propto p_2(k_\ell, N) , \qquad \sum_\ell w_\ell = 1 , \qquad (4) $$

and $p_2(k_\ell, i = N)$ is given in Eq. (2) with the normalization in
Eq. (4). Thus, the adaptive walker uses the same principle of
selection that applied earlier to linking from the visited node. It
should be noted that the weights $w_\ell$ are not necessarily
identical to the linking probabilities, both because they are
evaluated in the fully grown network and because they are normalized.
In this way, the adaptive random walker (ARW) utilizes the full
information about the local architecture of in- and out-degree of the
graph, whereas the naive random walker (NRW) is driven exclusively by
the out-degree distribution, thus exploiting only a part of the
available information. The walk continues as long as $q_{out}(k) > 0$
at the last visited node, and stops at a node with no out-links,
$q_{out}(k) = 0$ (border of the graph [14]).
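The two walk rules can be sketched as follows, assuming $M = 1$ so that the unnormalized weight of a link in Eq. (4) is $\alpha + q_{in}(k_\ell)$. The function name `walk`, the `max_steps` guard and the four-node toy graph are hypothetical choices for illustration only.

```python
import random

def walk(out_links, q_in, start, alpha, adaptive, rng, max_steps=10_000):
    """Return the list of visited nodes, starting at `start`."""
    path = [start]
    node = start
    for _ in range(max_steps):
        targets = out_links[node]
        if not targets:                  # q_out = 0: border of the graph
            break
        if adaptive:
            # ARW: link weight proportional to alpha*M + q_in(k_l), M = 1;
            # rng.choices normalizes the weights, as required by Eq. (4)
            weights = [alpha + q_in[k] for k in targets]
            node = rng.choices(targets, weights=weights)[0]
        else:
            # NRW: each outgoing link has probability 1/q_out(n)
            node = rng.choice(targets)
        path.append(node)
    return path

# Toy graph (hypothetical): node 3 has no out-links, node 2 has in-degree 2.
out_links = [[1, 2], [2], [0, 3], []]
q_in = [0] * len(out_links)
for targets in out_links:
    for k in targets:
        q_in[k] += 1

rng = random.Random(1)
arw_path = walk(out_links, q_in, 0, alpha=0.25, adaptive=True, rng=rng)
nrw_path = walk(out_links, q_in, 0, alpha=0.25, adaptive=False, rng=rng)
print("ARW:", arw_path)
print("NRW:", nrw_path)
```

Both walkers share the stopping rule; they differ only in how the next link is drawn.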

Both the ARW and the NRW traverse a connected path of nodes on the
graph that, in principle, is a subset of the set of all connected
nodes (the so-called connected component, which is usually searched
by Web crawls). Similarly, the length of the walk is not equal to the
depth of the connected component, because the walk can move
backwards, making loops of any size. Hence the random-walk path
represents a local structure on the graph, which we discuss below. In
metabolic and catalytic reaction networks the path of the walk
represents a possible relaxation process between two states
corresponding to the departing and final node of the walk,
respectively. In this context a naive random walk cannot be
considered as a process of choice, given that the presence of enzymes
or catalysts inevitably selects the preferred reaction, much like our
adaptive random walk.

It is interesting to define the distance traversed by a walker on the
network in terms of the difference in node ranks $\Delta q$, in which
the graph has a nontrivial topology. The distribution of such
distances in principle depends on the time of the walk. In Fig. 1 we
present the results for the time-integrated distribution
$W(\Delta q)$ of distances $\Delta q$ both for in- and out-links. The
two bottom curves in the main figure represent the distribution for
the adaptive random walk, whereas the curves above the dotted line
are the corresponding distributions in the case of the naive random
walk. As is seen immediately from Fig. 1, the connected subgraphs
visited by an ensemble of random walkers have a topology that can be
described by the power-law distributions

$$ W(\Delta q_{out}) \sim (\Delta q_{out})^{-\delta_{out}} ; \qquad W(\Delta q_{in}) \sim (\Delta q_{in})^{-\delta_{in}} , \qquad (5) $$

with distinct distributions of in- and out-degree, resembling the
global structure of the graph in Eq. (3). For the adaptive random
walk the exponents $\delta_{in}$ and $\delta_{out}$ are close to
$\tau_{in}$ and $\tau_{out}$ in Eq. (3), respectively, of the
underlying graph structure (see inset to Fig. 1). In fact, the
scaling exponents do depend on the size of the ensemble $N_a$
relative to the size of the network $N$. In the inset to Fig. 1 we
show the exponents $\delta_{in}$ and $\delta_{out}$ measured by an
ARW ensemble of the same size ($N_a$ = 2 × 10^) as the distributions
in the main Fig. 1, but for a larger network $N$ = 10^. These
measurements result in larger exponents compared to the slopes
measured in the smaller network $N$ = 10^ (cf. main Fig. 1). The
exponents decrease with increasing ratio $N_a/N$. When a large enough
ensemble of adaptive random walkers is used, the structure of the
selected subgraphs approaches the underlying topology of the entire
graph (see caption to Fig. 1). Therefore, an ensemble of adaptive
random walkers on the Web graph can be used as a search algorithm for
the structure of connected subgraphs.
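A time-integrated distribution of rank distances can be accumulated from a set of walk paths along the following lines. This is a sketch under stated assumptions: the distance is taken as the absolute difference $|q(\mathrm{visited}) - q(\mathrm{start})|$ of the chosen degree (in or out), zero distances are skipped because they cannot be shown on a logarithmic axis, and the name `distance_histogram` and the example data are hypothetical.

```python
from collections import Counter

def distance_histogram(paths, degree):
    """Normalized histogram of |q(visited) - q(start)| over all walkers and steps."""
    counts = Counter()
    for path in paths:
        q_start = degree[path[0]]
        for node in path[1:]:
            dq = abs(degree[node] - q_start)
            if dq > 0:                   # zero distance is skipped (assumption)
                counts[dq] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dq: c / total for dq, c in sorted(counts.items())}

# Two short example paths on three nodes with degrees 1, 3 and 2.
example_paths = [[0, 1, 2], [2, 0]]
example_degree = [1, 3, 2]
W = distance_histogram(example_paths, example_degree)
print(W)
```

Passing the list of in-degrees or out-degrees as `degree` gives the two distributions of Eq. (5); the log-binning used for the figure is not included here.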




FIG. 1. Time-integrated distributions $W(\Delta q)$ of the distances
$\Delta q$ measured in the node ranks plotted against the distance
$\Delta q$ for a naive random walk in-degree (open squares) and
out-degree (open triangles), and for adaptive random walk in-degree
(filled squares) and out-degree (filled triangles). Data are
log-binned with bin ratio 1.1. Parameters: $\alpha = 0.25$,
$N$ = 10*, $N_a$ = 2 × 10^. Slopes of the curves are (top to bottom)
$\delta_{in}^{NRW} = 1.16$, $\delta_{out}^{NRW} = 1.55$,
$\delta_{in}^{ARW} = 2.11$, $\delta_{out}^{ARW} = 2.62$. Slope of
dotted line is -2. Inset: Scaling exponents of the ARW defined in
Eq. (5) for out-degree (triangles) and in-degree (squares) for
$N$ = 10^, $N_a$ = 2 × 10^ vs. $\alpha$. Also shown are the exponents
$\tau_{in}$ (×) and $\tau_{out}$ (+) defined in Eq. (3).

In the case of the naive random walk the distributions are
qualitatively different from the global graph structure (see the two
top curves in main Fig. 1). While the distributions of in- and
out-degree still differ from each other, the corresponding scaling
exponents are $\delta_{in} < 2$ and $\delta_{out} < 2$. (Some
consequences of this property will be discussed later.) The
distributions $W(\Delta q)$ shown in Fig. 1 indicate that in the
ensemble of naive random walkers several well connected nodes (i.e.,
for large $\Delta q$) are visited more frequently than in an ensemble
of the same size made of adaptive random walkers (the frequency can
differ by up to four orders of magnitude for the simulated
conditions, see Fig. 1). This suggests that a naive random walker is
wasting time by walking in closed loops, which often pass through
several highly connected nodes on the graph.

Direct measurements of the access time support this conclusion. We
measure the average access time for a given distance $\Delta q$ in
the ensemble of walkers corresponding to the distributions in Fig. 1.
The results are shown in Fig. 2. The average access time for the
naive random walker is generally higher than that of the adaptive
walker, the ratio reaching
$\langle t_{rw} \rangle / \langle t_{aw} \rangle \sim$ 0.5 × 10^ for
large distances. The most remarkable feature of this class of
networks is that the average access time decreases as a fractional
power of the distance in hierarchy, i.e.,

$$ \langle t_w \rangle \sim (\Delta q)^{-\theta} , \qquad (6) $$

for distances $\Delta q$ below the cut-off. On the double-hierarchical
graphs that we study here, Eq. (6) applies both for the random and
for the adaptive walk, with different exponents $\theta_{ARW}$ and
$\theta_{NRW}$ as shown in Fig. 2. This implies that the decrease of
the access time with the rank difference is an essential feature of
these graphs, which can be understood in the following way. Consider
a node of in-degree $k$. It can be directly linked to a node of
degree $k + \Delta q$. The majority of nodes have a link with a
rather large rank difference, linking to a dominant attractor in view
of the rule (2). The ARW, which is designed to follow such links,
quickly reaches a locally dominant attractor. According to the fast
decaying distribution $W(\Delta q)$ (we measured
$\delta_{in}^{ARW} = 1.97 \pm 0.04$) there is a small number of
attractors in the area scanned by the ARW. Hence, the access to any
other node, including a node with a large out-degree, often goes via
a dominant attractor. The nodes with a large out-degree have the
capacity to disperse the links throughout the network, because the
probability to link back to the attractor decreases with the number
of out-links (cf. Eq. (4)), i.e., they act as the promoters of the
dynamics. In Fig. 2 the average time to access a dominant promoter is
2-20 steps for the ARW, compared with ~ 200-600 steps for the NRW,
suggesting that the naive random walker makes up to 300 cycles
containing a locally dominant attractor and promoter node. Evidence
of such structure in the real Web was recently discussed in Ref. [15].
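One possible way to estimate the average access time as a function of the rank distance is sketched below. The text does not spell out the measurement procedure, so the following is an assumption: the access time of a node is taken as the step at which a walker first reaches it, and the distance is the absolute degree difference to the departing node. The name `mean_access_time` and the example data are hypothetical.

```python
from collections import defaultdict

def mean_access_time(paths, degree):
    """Average first-visit step, grouped by |q(visited) - q(start)|."""
    step_sum = defaultdict(int)
    visits = defaultdict(int)
    for path in paths:
        q_start = degree[path[0]]
        seen = {path[0]}
        for t, node in enumerate(path):
            if node in seen:             # only the first visit counts
                continue
            seen.add(node)
            dq = abs(degree[node] - q_start)
            step_sum[dq] += t
            visits[dq] += 1
    return {dq: step_sum[dq] / visits[dq] for dq in sorted(visits)}

# Example: the first walker returns to its start before reaching node 2.
example_paths = [[0, 1, 0, 2], [0, 2]]
example_out_degree = [1, 4, 2]
print(mean_access_time(example_paths, example_out_degree))
```

Fitting a straight line to the logarithm of these averages against the logarithm of the distance would give an estimate of $-\theta$ in Eq. (6), within the scaling region.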

The probability $P(t)$ that a walk survives for $t$ steps on the Web
graph is a quantitative measure of relaxation of the graph. The
simulated survival probability $P(t)$ shown in the inset to Fig. 2
indicates once again that the adaptive and naive random walks
represent two types of relaxation processes. Although the fitted
expressions are not definitive and require further theoretical
analysis, they suggest that a naive random walk on this class of
graphs corresponds to a stretched exponential relaxation, similar to
relaxation in complex disordered systems, whereas the adaptive random
walk dies off nearly exponentially.
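The survival probability can be estimated from the lengths of the simulated walks. In this sketch $P(t)$ is taken as the fraction of walks that made at least $t$ steps before reaching a node without out-links; the function name `survival_probability` and the example lengths are hypothetical.

```python
def survival_probability(walk_lengths):
    """P(t) for t = 0..max length: fraction of walks with at least t steps."""
    n = len(walk_lengths)
    t_max = max(walk_lengths)
    return [sum(1 for length in walk_lengths if length >= t) / n
            for t in range(t_max + 1)]

# Four example walks that stopped after 1, 2, 2 and 5 steps.
P = survival_probability([1, 2, 2, 5])
print(P)
```

For a walk stored as a list of visited nodes, the number of steps is the length of the list minus one.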
FIG. 2. Average access time of an adaptive random walker (bullets)
and a naive random walker (squares) versus distance $\Delta q_{out}$
measured by out-degree ranks between departing and visited node. In
both cases an ensemble of $N_a$ = 2 × 10^ walkers on the Web graph of
$N$ = 10* nodes was used. Inset: Survival probability $P(t)$ versus
time steps $t$ for adaptive (bullets) and naive (squares) random
walker on the Web graph simulated in the same conditions as the main
figure. Fitting lines are: (ARW) P(t) = t-° °^ exp {-t°-^^), and
(NRW) P(t) =t+°-^^exp(-t'''*2).



As potential applications of these results we can mention the
processes of message-passing and infection-spreading on the Web
graph. Assuming that an infection can be transmitted with the walker,
we find that both the complex architecture of the graph and the walk
strategies are relevant for the spreading. Due to the slow relaxation
and heavy tail of the distribution $W(\Delta q)$ of visited nodes
($\delta_{in} < 2$) in the case of the naive random walkers, an
epidemic is likely to spread over the entire graph. With the adaptive
random walkers, on the other hand, the affected area remains
restricted; however, locally dominant nodes are quickly affected. The
adaptive random walk also offers an efficient algorithm of message
passing to a given destination on the Web graph [16].

To explore the complex structure of the Web graph we proposed an
adaptive random walk that learns its move strategies from the
time-varying local dynamic rules of the graph itself. The walker has
a short access time to dominant nodes on the graph and affects a
restricted area, properties that are relevant for the potential
applications. The adaptive random walk is a good candidate for a
message-passing algorithm on the Web graph (and catalytic reactions
in the same universality class), which builds its efficiency on fully
exploiting the local graph structure with double-hierarchical
connectivity.



ACKNOWLEDGMENTS 



This work was supported by the Ministry of Education, Science and
Sports of the Republic of Slovenia. I thank Vyatcheslav Priezzhev
for correspondence.



[1] S. H. Strogatz, Nature 410, 268 (2001).
[2] See series of articles "Beyond reductionism" in Science 284, 79
(1999).
[3] P. M. Gleiss, P. F. Stadler, A. Wagner, and D. A. Fell,
cond-mat/0009124; H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and
A. Barabasi, Nature 407, 651 (2000).
[4] M. E. J. Newman, Proc. Natl. Acad. Sci. 98, 404 (2001).
[5] S. Redner, Eur. Phys. J. B 4, 131 (1998). Graphs in this
universality class are well described by the model of A.-L. Barabasi,
R. Albert, and H. Jeong, Physica A 272, 173 (1999).
[6] For a recent reference on the Web structure see (a) A. Broder,
R. Kumar, F. Maghoul, P. Raghavan, R. Sridhar, R. Stata, A. Tomkins,
and J. Wiener, Computer Networks 33, 209 (2000), and search
algorithms (b) S. Lawrence and C. L. Giles, Nature 400, 107 (1999),
(c) L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman,
cs.NI/0103016, (d) J. M. Kleinberg, R. Kumar, P. Raghavan,
S. Rajagopalan, and A. S. Tomkins, The Web as a graph: measurements,
models, and methods, available on
http://www.almaden.ibm.com/cs/k53/clever/
[7] B. Tadic, Physica A 293, 273 (2001); cond-mat/0011442.
[8] Reaction with pouring reactants and removing the products with a
constant rate was proposed recently to study work produced by
molecular DNA motors, B. Yurke et al., Nature 406, 605 (2000).
[9] (a) D. S. Callaway, M. E. J. Newman, S. H. Strogatz, D. J. Watts,
Phys. Rev. Lett. 85, 5468 (2000); (b) M. Bauer and O. Golinelli,
cond-mat/0102011.
[10] R. Pastor-Satorras and A. Vespignani, cond-mat/0102028; C. Moore
and M. E. J. Newman, Phys. Rev. E 61, 5678 (2000).
[11] The actual number of links in the Web exceeds the number of nodes,
suggesting that a larger M would be more realistic. In the model the
universal properties of the network in the scaling region are not
affected when M is varied.
[12] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Phys.
Rev. Lett. 85, 4633 (2000).
[13] Recently it was demonstrated that two separate distributions of
out- and in-degree are necessary in order to describe the structural
complexity of the Web graph, such as the occurrence of the giant
connected component, see M. E. J. Newman, S. H. Strogatz, and
D. J. Watts, cond-mat/0007235 and Ref. [6]. Thus, the models based on
the in-degree distribution alone do not satisfy the minimum necessary
requirements for the dynamics of the Web.
[14] B. Bollobás, Modern Graph Theory, Springer, Berlin, 1998.
[15] Topical search algorithms in reference (d) in [6] reveal a
similar structure with the authority-and-hub nodes.
[16] Recently several other search algorithms that utilize local
information in a power-law graph with symmetric links were proposed,
see reference (c) in [6]. It was shown that search time scales
sublinearly with the network size.
